Rectangular Attribute Cardinality Map: A New Histogram-like Technique for Query Optimization
Abstract
Current database systems use histograms to approximate the frequency distributions of the attribute values of relations. These approximations are used to estimate query result sizes and access plan costs efficiently. Even though histograms have been in use for nearly two decades, there have been no significant mathematical techniques (other than those used in statistics for traditional histogram approximations) to study them. In this paper, we introduce a new histogram-like approximation strategy, called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration. In this method, the density function within a given sector is approximated by a rectangular cell, where the height of the cell is chosen so as to guarantee that the actual probability density differs from the approximated one by at most a user-specified tolerance. Furthermore, unlike the two traditional histogram types, namely equi-width and equi-depth, the R-ACM is neither equi-width nor equi-depth. Analytically, we show that for the R-ACM, the distribution of an attribute value within a sector is Binomial. This permits us to derive worst-case and average-case results for the estimation errors of the probability mass itself. Our theoretical results, which include rigorous maximum likelihood and expected-case analyses, together with an extensive set of experiments, demonstrate that the R-ACM scheme (which is essentially histogram-like) is much more accurate than the traditional histograms for query result size estimation. Due to its high accuracy and low construction costs, we hope that it could become an invaluable tool for query op-
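The construction described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple greedy left-to-right scan that extends a sector only while every frequency in it stays within the tolerance of the sector mean. The function names `build_racm_sketch` and `estimate_range`, the list-of-frequencies input, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (start index, end index inclusive, mean frequency of the sector)
Sector = Tuple[int, int, float]


def build_racm_sketch(freqs: List[float], tolerance: float) -> List[Sector]:
    """Greedy sketch (an assumption, not the paper's procedure).

    freqs[i] is the frequency of the i-th attribute value in sorted order.
    A sector grows while every frequency in it differs from the sector
    mean by at most `tolerance`; otherwise a new sector is started.
    """
    sectors: List[Sector] = []
    start = 0
    for i in range(1, len(freqs) + 1):
        if i < len(freqs):
            candidate = freqs[start:i + 1]
            mean = sum(candidate) / len(candidate)
            if all(abs(f - mean) <= tolerance for f in candidate):
                continue  # value i fits, keep extending the sector
        # Close the current sector [start, i - 1] and store its mean height.
        current = freqs[start:i]
        sectors.append((start, i - 1, sum(current) / len(current)))
        start = i
    return sectors


def estimate_range(sectors: List[Sector], lo: int, hi: int) -> float:
    """Estimate the result size of a range query over value indices lo..hi."""
    total = 0.0
    for s_lo, s_hi, mean in sectors:
        overlap = min(hi, s_hi) - max(lo, s_lo) + 1
        if overlap > 0:
            total += overlap * mean  # each value is approximated by the mean
    return total


if __name__ == "__main__":
    frequencies = [10, 11, 9, 30, 31, 5]  # made-up sample data
    racm = build_racm_sketch(frequencies, tolerance=2)
    print(racm)
    print(estimate_range(racm, 1, 3))
```

Because sector boundaries follow the data rather than a fixed width or a fixed count, the resulting sectors are in general neither equi-width nor equi-depth, which is the property the abstract highlights.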
Similar resources
An Empirical Comparison of Histogram-Like Techniques for Query Optimization
We consider the problem of Query Optimization, which consists of a database system choosing, among many different Query Evaluation Plans (QEPs), the most economical one for a given query. Since the number of QEPs increases exponentially with the number of relations involved in the query, query optimization is a very complex problem. Many estimation techniques have been developed in order to approxim...
Benchmarking attribute cardinality maps for database systems using the TPC-D specifications
Benchmarking is an important phase in developing any new software technique because it helps to validate the underlying theory in the specific problem domain. But benchmarking of new software strategies is a very complex problem, because it is difficult (if not impossible) to test, validate and verify the results of the various schemes in completely different settings. This is even more true in...
The Efficiency of Histogram-like Techniques for Database Query Optimization
One of the most difficult tasks in modern day database management systems is information retrieval. Basically, this task involves a user query, written in a high-level language such as the Structured Query Language, and some internal operations, which are transparent to the user. The internal operations are carried out through very complex modules that decompose, optimize and execute the differ...
SASH: A Self-Adaptive Histogram Set for Dynamically Changing Workloads
Most RDBMSs maintain a set of histograms for estimating the selectivities of given queries. These selectivities are typically used for cost-based query optimization. While the problem of building an accurate histogram for a given attribute or attribute set has been well-studied, little attention has been given to the problem of building and tuning a set of histograms collectively for multidimens...
Query Selectivity Estimation Based on Improved V-optimal Histogram by Introducing Information about Distribution of Boundaries of Range Query Conditions
Selectivity is a parameter used by a query optimizer to estimate, early on, the size of the data that satisfies a query condition. It is calculated using an estimator of the distribution of values of the attribute involved in the processed query condition. Histograms built on attribute values from a database may serve as such a representation of the distribution. The paper introduces a ...
Publication date: 1999